Language model for the web search task in a spoken dialogue system for children

Authors

  • Jumpei Miyake
  • Shota Takeuchi
  • Hiromichi Kawanami
  • Hiroshi Saruwatari
  • Kiyohiro Shikano
Abstract

In this paper, we propose a method to improve speech recognition accuracy for web search utterances addressed to a spoken dialogue system. The speech data were collected with our speech-oriented information guidance system, "Takemaru-kun" [1], which has been in operation at a public community center since November 2002. Manual labeling shows that child utterances account for about 80% of the collected utterances. Most of the web search utterances consist of out-of-domain words, i.e., trendy words or proper nouns. To adapt the system to a wider domain, we propose to expand the language model and the vocabulary by collecting text from various web resources such as weblogs and open dictionaries. First, we analyze the characteristics of the adult and child web search utterances separately. Then, we compare a variety of training corpora for language model construction. Finally, we compare the performance of the resulting language models.
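
The vocabulary expansion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`tokenize`, `oov_rate`, `expand_vocabulary`), the `min_count` threshold, and the toy data are all hypothetical, and whitespace tokenization is a simplifying assumption (Japanese text would need a morphological analyzer). The sketch only shows how adding frequent words from web-collected text might lower the out-of-vocabulary (OOV) rate on search utterances.

```python
from collections import Counter


def tokenize(text):
    # Whitespace tokenization is a simplifying assumption; Japanese text
    # would need a morphological analyzer instead.
    return text.lower().split()


def oov_rate(vocabulary, utterances):
    """Fraction of tokens in the utterances that are not in the vocabulary."""
    tokens = [tok for utt in utterances for tok in tokenize(utt)]
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocabulary)
    return missing / len(tokens)


def expand_vocabulary(vocabulary, web_texts, min_count=2):
    """Add words seen at least min_count times in the web texts."""
    counts = Counter(tok for text in web_texts for tok in tokenize(text))
    frequent = {word for word, count in counts.items() if count >= min_count}
    return set(vocabulary) | frequent


# Toy, made-up data for illustration only (not from the paper).
base_vocab = {"search", "for", "the", "show", "me", "weather"}
utterances = ["search for pokemon", "show me the weather"]
web_texts = ["pokemon cards", "new pokemon game", "weather report"]

before = oov_rate(base_vocab, utterances)
expanded = expand_vocabulary(base_vocab, web_texts)
after = oov_rate(expanded, utterances)
```

Comparing `before` and `after` gives a rough indication of how much a given web resource helps coverage; a real evaluation would also retrain the language model and measure recognition accuracy.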

Similar resources

On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data

The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...

Rapid Development Process of Spoken Dialogue Systems using Collaboratively Constructed Semantic Resources

We herein propose a method for the rapid development of a spoken dialogue system based on collaboratively constructed semantic resources and compare the proposed method with a conventional method that is based on a relational database. Previous development frameworks of spoken dialogue systems, which presuppose a relational database management system as a background application, require complex...

A bootstrapping approach for developing language model of new spoken dialogue systems by selecting web texts

This paper proposes a bootstrapping method for constructing statistical language models for new spoken dialogue systems by collecting and selecting sentences from the World Wide Web (WWW). To make effective search queries that cover the target domain in full detail, we use a set of documents describing the target domain as seed data. An important issue is how to filter the retrieved We...

Rapid transition to new spoken dialogue domains: language model training using knowledge from previous domain applications and web text resources

In generic automatic speech recognition (ASR) systems, typically, language models (LMs) are trained to work within a broad range of input conditions. ASR systems used in domain-specific spoken dialogue systems (SDSs) are more constrained in terms of content and style. A mismatch in content and/or style between training and operating conditions results in performance degradation for the dialogue ...

Publication date: 2008